SYSTEM AND METHOD FOR PARSING A DOCUMENT
Background of the Invention 

This invention relates generally to a system and method for processing a document and in 
particular to a system and method for identifying a plurality of phrases within the document 
which indicate the context of the document. 

Various factors have contributed to the extensive storage and retrieval of textual data 
information using computer databases. A dramatic increase in the storage capacity of hard drives 
coupled with a decrease in the cost of computer hard drives, and increases in the transmission 
speed of computer communications have been factors. In addition, the increased processing 
speed of computers and the expansion of computer communications networks, such as a bulletin 
board or the Internet, have been factors. People therefore have access to the large amounts of 
textual data stored in these databases. However, although the technology facilitates the storage 
of and the access to the large amounts of textual data, there are new problems that have been 
created by the large amount of textual data that is now available. 

In particular, a person trying to access textual data in a computer database having a large 
amount of data needs a system for analyzing the data in order to retrieve the desired information 
quickly and efficiently without retrieving extraneous information. In addition, the user of the 
system needs an efficient system for condensing each large document into a plurality of phrases 
(one or more words) which characterize the document so that the user of the system can 
understand the document without actually viewing the entire document. A system for
condensing each document into a plurality of key phrases is known as a parsing system or a
parser.

In one typical parser, the parser attempts to identify phrases which are repeated often
within the document and identifies those phrases as being key phrases which characterize the
document. The problem with such a system is that it is very slow since it must count the
repetitions of each phrase in the document. It also requires a large amount of memory. As the
amount of data to be parsed increases, the slow speed of this parser becomes unacceptable.
Another typical parser performs a three step process to identify the key phrases. First, each word
in the document is assigned a tag based on the part of speech of the word (e.g., noun, adjective,
adverb, verb, etc.) and certain parts of speech, such as an article or an adjective, may be removed
from the list of phrases which characterizes the document. Next, one or more sequences of
words (templates) may be used to identify and remove phrases which do not add any
understanding to the document. Finally, any phrase which is an appropriate part of speech and
does not fall within one of the templates is accepted as a key phrase which characterizes the
document. This conventional parser, however, is also slow, which is unacceptable as the amount
of data to be parsed increases.

In all of these conventional parser systems, the parser attempts to break the document
down into smaller pieces based on the characteristics (frequency of repetition or part of speech)
of the particular words in the document. The problem is that language generally is not that easily
classified and therefore the conventional parser does not accurately parse the document or
requires a large amount of time to parse the document. In addition, the conventional parser
systems are very slow because they all attempt to use complex characteristics of the language as
a method for parsing the key phrases out of the document. These problems with conventional
parsers become more severe as the number of documents which must be parsed increases.
Today, the number of documents which must be parsed is steadily increasing at a tremendous
rate due to, among other things, the Internet and the World Wide Web. Therefore, these
conventional parsers are not acceptable. Thus, it is desirable to provide a parsing system and
method which solves the above problems and limitations with conventional parsing systems and
it is to this end that the present invention is directed.

Summary of the Invention

A parser system and method in accordance with the invention is provided in which the
break characters within a sentence or a paragraph are used to parse the document into a plurality
of key phrases. The parser system in accordance with the invention is very fast and does not
sacrifice much accuracy for the speed. The break characters within the document may include
punctuation marks, certain stop words and certain types of words such as verbs and articles. The
parser system may include a buffer which receives one or more words before it receives a break
character. When the buffer receives a break character, the parser may determine whether the
phrase before the break character is saved based on the type of break character. In particular, if
the break character is a punctuation mark, the parser may keep the one or more words before the
break character as a key phrase. If the break character is another type of character, the phrase
before the break character may or may not be saved. Once the fate of the phrase has been
determined, the buffer is flushed and the next sequence of one or more words is read into the
buffer so that it may also be parsed. In this manner, a plurality of phrases in the document may
be rapidly extracted from the document based on the break characters within the sentences and
paragraphs of the document.

The parser system in accordance with the invention may also be used to parse various
different foreign languages into phrases provided that the rules database includes rules that are
applicable to the particular foreign language. In particular, each foreign language may have
slightly different syntax or characters (in the case of Asian languages or Arabic, for example) so
that the rules must reflect those syntactic and character differences.

Thus, in accordance with the invention, a system for parsing a piece of text into one or
more phrases which characterize the document is provided. The system comprises a buffer for
reading one or more words from the piece of text into the buffer and a parser for identifying a
phrase contained in the buffer, the phrase being a sequence of two or more words in between
break characters. The parser further comprises means for determining the type of break character
that follows the identified phrase and means for saving a key phrase from the buffer based on the
determined type of break character. The key phrases are stored in a database.

In accordance with another aspect of the invention, the parsing method may include a
two-pass process wherein phrases are extracted from the piece of text as described above.
During the second pass, all of the occurrences of the extracted phrases in the piece of text are
retrieved. The second pass ensures that phrases that were not extracted at each location in the
piece of text may still be retrieved.

Brief Description of the Drawings

Figure 1 is a block diagram of a text processing system; 

Figure 2 is a block diagram of a parsing system in accordance with the invention; 

Figure 3A is a flowchart illustrating a two-pass parsing method in accordance with the
invention;

Figure 3B is a flowchart illustrating more details of the extracting phrases step of the
parsing method shown in Figure 3A;

Figure 4 is an example of a document to be parsed by the parsing system in accordance
with the invention;

Figures 5A - 5L are diagrams illustrating the operation of the parsing buffer in
accordance with the invention on the document shown in Figure 4;

Figure 6 is a diagram illustrating a piece of Japanese text; and

Figure 7 is a diagram illustrating the Japanese phrases extracted from the Japanese text of
Figure 6 in accordance with the invention.

Detailed Description of a Preferred Embodiment

The invention is particularly applicable to a system for parsing English language 
documents and it is in this context that the invention will be described. It will be appreciated, 
however, that the system and method in accordance with the invention has greater utility, such as
to other languages and to various different pieces of textual data. To better understand the
invention, a text processing system will now be described.

Figure 1 is a block diagram of a text processing system 10. The text processing system
10 may include a parser system 12, a clusterizer 14, a map generator 16 and a database (DB) 18.
The text processing system may receive one or more pieces of text, such as stories, press
releases or documents, and may generate a map graphically showing the relationships between
the key phrases in the document. Each piece of text may be received by the parser system 12
which processes each piece of incoming text and generates one or more key phrases for each
piece of text which characterizes the piece of text. The key phrases may be stored in the database
18. The details about the parser system will be described below with reference to Figures 2 - 5.
Once the key phrases are extracted from each piece of text, the clusterizer 14 may generate one
or more clusters of the key phrases based on the relationships between the phrases. The clusters
generated may also be stored in the database 18. The map generator 16 may use the generated
clusters for the pieces of text in the database in order to generate a graphical map showing the
relationships of the key phrases within the various pieces of text in the database to each other so
that a user of the system may easily search through the database by viewing the key phrases of
the pieces of text. More details about the clusterizer and map generator are disclosed in
co-pending U.S. patent application serial no. 08/801,970 which is owned by the assignee of the
present invention and is incorporated herein by reference. The text processing system may be
implemented in a variety of manners including a client/server type computer system in which the
client computers access the server via a public computer network, such as the Internet. The
parser, the clusterizer and the map generator may be software applications being executed by a
central processing unit (not shown) of the text processing system 10. Now, the parser system 12
in accordance with the invention will be described in more detail.

Figure 2 is a block diagram of the parsing system 12 in accordance with the invention.
The parsing system 12 may include a buffer 20, a parser 22 and a rules database (rules DB) 24.
The buffer may store one or more words of the incoming piece of text, which may be a
document, which are analyzed by the parser 22 using the rules contained in the rules DB 24. The
output of the parser system 12 is one or more phrases (each phrase containing one or more
words) which characterize the document being parsed. In particular, the parser may separate
phrases in the document based on break characters within the document in accordance with the
invention. In more detail, one or more words may be read into the buffer from the document
until a break character is identified. Thus, the parser system 12 identifies phrases which are
between break characters. Then, based on the type of break character, the phrase may be saved
as a key phrase or deleted. The parser system 12, for example, may be implemented as one or
more pieces of software being executed by a microprocessor (not shown) of a server computer
which may be accessed by a plurality of client computers over a computer network, such as the
Internet, a local area network or a wide area network. The parser 22 advantageously rapidly
extracts key phrases from a piece of text using break characters. The break characters in
accordance with the invention will now be described.

The break characters may include an explicit break, such as a punctuation mark, numbers,
words containing numbers, and stop words. The stop words may be further classified as soft stop
words or hard stop words. Each of these different break characters will now be described. The
explicit break characters may include various punctuation symbols, such as a period, a comma, a
semicolon, a colon, an exclamation point, right or left parenthesis, left or right square brackets,
left or right curly braces, a return character or a line feed character. The stop characters may be a
generated list or it may include a slash (/) and an ampersand symbol (&). A separator may be
defined as digits, letters, foreign characters, break characters, apostrophes, dashes and other stop
characters. The various words in a piece of text may be categorized as articles, connectors, hard
and soft stop characters, linguistic indicators, and syntactic categories such as nouns, verbs,
irregular verbs, adjectives and adverbs.
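
By way of illustration only, the classification of a token into the break character types described above might be sketched as follows. The stop word tables shown are hypothetical, since the actual contents of the rules database are not listed here; the words "on", "are", "August" and "October" are placed according to the examples given in this description, and the function name classify_token is likewise an assumption.

```python
# Hypothetical rule tables; the actual rules database contents are not given.
EXPLICIT_BREAKS = set(".,;:!()[]{}\r\n")
SOFT_STOP_WORDS = {"are"}                        # assumed soft stop words
HARD_STOP_WORDS = {"on", "august", "october"}    # assumed hard stop words


def classify_token(token: str) -> str:
    """Return 'explicit', 'number', 'soft_stop', 'hard_stop' or 'word'."""
    # A token made up only of punctuation is an explicit break.
    if token and all(ch in EXPLICIT_BREAKS for ch in token):
        return "explicit"
    # Numbers and words containing numbers are also break characters.
    if any(ch.isdigit() for ch in token):
        return "number"
    lowered = token.lower()
    if lowered in SOFT_STOP_WORDS:
        return "soft_stop"
    if lowered in HARD_STOP_WORDS:
        return "hard_stop"
    # Anything else is an ordinary word that may be added to the buffer.
    return "word"
```

In such a sketch, supporting another language would amount to supplying different tables, which is consistent with keeping the rules in a separate rules database.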

In parsing the characters in the piece of text, separators may always be added to a phrase.
An apostrophe or dash at the beginning of a word is treated as a break character (see below), an
apostrophe or dash at the end of a word is also treated as a break character and a word with an
apostrophe or dash in the middle of the word is added to the phrase in the buffer. All stop
characters and breaks are treated as stop characters and breaks as described below. At the word
level of parsing, proper nouns are retained by testing for an upper case letter at the first character
of the word. In addition, all words with only upper case letters and numeric words are kept in the
buffer. Optionally, a numeric string may be classified and treated as a stop character. The
following are mandatory word level parsing rules. First, the word following a possessive "s"
may be deleted. For example, as the sentence "The cat's paw is wet." is parsed in accordance
with the invention, "the" is deleted and "cat" is put into the buffer and then deleted when the
break character (the apostrophe) is detected. The apostrophe is deleted because it is punctuation
and then the next character to parse is the possessive "s" after the apostrophe which is deleted
along with the word "paw" since it follows the possessive "s". Connector words appearing at the
beginning of a phrase are also deleted although a connector word followed by "The" is kept in
the buffer. For a hard stop character, the last phrase connected to the hard stop character is
deleted and the remaining buffer is processed. A soft stop character may be treated as a break
character. A repeated character is treated as a stop character.

To further remove unwanted words for parsing, some optional phrase level parsing rules
may be used. In particular, phrases longer than a predetermined length, such as six words, may
be deleted, a phrase with all upper case words may be deleted and a phrase with all numeric
words may be deleted. All of the above parsing rules may be stored in the parsing rules database
24 shown in Figure 2. The details of the parser system 12 will now be described with
reference to Figures 3A and 3B.
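
By way of illustration only, the optional phrase level parsing rules described above might be sketched as a simple filter. The function name keep_phrase and the max_words parameter are assumptions, and "numeric word" is taken here to mean a string consisting only of digits, which is a narrower reading than the description may intend.

```python
def keep_phrase(phrase: str, max_words: int = 6) -> bool:
    """Apply the optional phrase level rules; return True if the phrase is kept."""
    words = phrase.split()
    if not words:
        return False
    # Rule 1: phrases longer than a predetermined length (six words) are deleted.
    if len(words) > max_words:
        return False
    # Rule 2: phrases in which every word is upper case are deleted.
    if all(word.isupper() for word in words):
        return False
    # Rule 3: phrases in which every word is numeric are deleted
    # (assumption: "numeric" means digits only).
    if all(word.isdigit() for word in words):
        return False
    return True
```

For example, keep_phrase("NEC Corporation") keeps the phrase because only one of its two words is all upper case.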

Figure 3A is a flowchart illustrating a two pass parsing process 30 in accordance with the
invention. In particular, during a first pass 40, one or more phrases are extracted from a piece of
text using the hard and soft stop words as described below with reference to Figure 3B. The first
pass thus extracts noun phrases. For example, if a piece of text includes, "The big frog and the
kangaroo jumps down.", the first pass extracts the phrase "big frog", but not "kangaroo jumps"
as described below. During a second pass 41, all extracted phrases are retrieved from the piece
of text. In particular, the occurrence of each extracted phrase in the piece of text may be
retrieved from the piece of text. For example, assume that a piece of text contains the fragments,
"The software bugs on. . ." and "software bugs are. . .". The parser in the first step throws away
the first occurrence of the term "software bugs" since it is followed by a hard stop, but retains the
second occurrence since it is followed by a soft stop. To prevent the parser from discarding good
noun phrases, such as the first occurrence of the term "software bugs", the second pass retrieves
all occurrences of the extracted phrases from the piece of text so that, for example, both
occurrences of the term "software bugs" are retrieved. Now, the first pass of the method will be
described in more detail.
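
By way of illustration only, the second pass described above might be sketched as a search for every occurrence of each phrase that the first pass extracted. The function name second_pass, the use of regular expressions, the case-insensitive matching and the choice to report character offsets are all assumptions rather than details taken from the embodiment.

```python
import re
from typing import Dict, Iterable, List


def second_pass(text: str, phrases: Iterable[str]) -> Dict[str, List[int]]:
    """Map each extracted phrase to the start offsets of all of its occurrences."""
    occurrences: Dict[str, List[int]] = {}
    for phrase in phrases:
        # Lookarounds keep matches on whole-phrase boundaries, so that
        # "software bugs" is not found inside a longer run of letters.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.IGNORECASE
        )
        occurrences[phrase] = [match.start() for match in pattern.finditer(text)]
    return occurrences


sample = "The software bugs on the disk; software bugs are common."
print(second_pass(sample, ["software bugs"]))
```

In this sample, the occurrence followed by the hard stop "on" would be discarded by the first pass, but both occurrences are found by the second pass.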

Figure 3B is a flowchart illustrating more details of the phrase extracting step 40 for
parsing a document in accordance with the invention. The method begins as a first word of the
document is loaded into the buffer from a document database or a memory of the server in step
42. Next, the parser determines if the word is a break character in step 44. The parser may also
delete certain characters or words at this stage of the parsing process. If the word is not a break
character, the method loops back to step 42 and the next word of the document is read into the
buffer. This process of reading a word into the buffer is repeated until a break character is
encountered so that the buffer contains a sequence of words (a phrase) which has a break
character before the sequence of words and a break character after the sequence of words. In this
manner, the document is parsed into phrases which are separated from one another by break
characters.

If a break character is encountered, the parser may determine if the break character is an
explicit break character in step 46, delete the break character and extract the phrase contained in
the buffer if an explicit break character exists in step 48. The phrase extracted from the buffer
may be stored in a database for future use. Next, in step 50, the buffer may be flushed to empty
the words from the buffer and the buffer may begin loading new words into the buffer in steps 42
and 44 until another break character is identified. Returning to step 46, if the break character is
not an explicit break character, the parser determines if the break character is a soft stop word in
step 52. If the break character is a soft stop word, then the soft stop word is deleted and the
phrase in the buffer is stored in the database in step 54, the buffer is flushed in step 50 and the
buffer is refilled with new words from the document. If the break character is not a soft stop
word (i.e., the break character is a hard stop word), the hard stop word and the phrase in the
buffer are deleted in step 56, the buffer is flushed in step 50 and refilled with new words from the
document in steps 42 and 44. In this manner, phrases from the document are extracted in
accordance with the invention using the break characters and the type of break character to
separate the phrases from each other and determine which phrases are going to be saved in the
database. The parser in accordance with the invention does not attempt to analyze each word of
the document to identify key phrases as with conventional systems, but does extract phrases from
the document more quickly than conventional parsers and with as much accuracy as the
conventional parsers. Now, an example of the operation of the parser in accordance with the
invention will be described with reference to Figures 4 and 5A - 5L.
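
By way of illustration only, the first pass of Figure 3B might be sketched as the loop below. The stop word tables are hypothetical, the input is assumed to be already split into word and punctuation tokens, articles are simply dropped, and a phrase is saved only when the buffer holds two or more words; these are assumptions drawn from the examples in this description rather than a definitive implementation.

```python
from typing import Iterable, List

# Hypothetical rule tables standing in for the rules database.
PUNCTUATION = set(".,;:!()[]{}")
SOFT_STOPS = {"are", "and"}      # assumed: the phrase before these is saved
HARD_STOPS = {"on", "jumps"}     # assumed: the phrase before these is deleted
ARTICLES = {"the", "a", "an"}    # assumed: deleted without ending a phrase


def first_pass(tokens: Iterable[str], min_words: int = 2) -> List[str]:
    """Extract phrases lying between break characters (steps 42 to 56)."""
    saved: List[str] = []
    buffer: List[str] = []
    for token in tokens:                       # step 42: read the next word
        lowered = token.lower()
        is_explicit = token in PUNCTUATION     # step 46
        is_soft = lowered in SOFT_STOPS        # step 52
        is_hard = lowered in HARD_STOPS
        if not (is_explicit or is_soft or is_hard):   # step 44: not a break
            if lowered not in ARTICLES:
                buffer.append(token)
            continue
        # Steps 48 and 54: explicit breaks and soft stops save the phrase.
        # Step 56: a hard stop deletes it. The break itself is always deleted.
        if (is_explicit or is_soft) and len(buffer) >= min_words:
            saved.append(" ".join(buffer))
        buffer.clear()                         # step 50: flush the buffer
    # Assumption: the end of the text acts like an explicit break.
    if len(buffer) >= min_words:
        saved.append(" ".join(buffer))
    return saved


print(first_pass("The big frog and the kangaroo jumps down .".split()))
```

Because each token is examined once and only table lookups are performed, the work in such a loop grows linearly with the length of the text, which is in keeping with the speed advantage asserted above.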

Figure 4 is an example of a document 60 to be parsed by the parsing system in
accordance with the invention while Figures 5A - 5L illustrate the operation of the buffer during
the parsing of the document 60 shown in Figure 4. In this example, the document is a short
electronic news story, but the parser may also extract phrases from any other piece of text. In
fact, the parser in accordance with the invention may be able to extract phrases from various
types of documents at speeds of up to 1 MByte of data per second. The particular story shown
describes a new "snake-like" robot developed by NEC. Figures 5A - 5L illustrate, in a table 68,
the operation of the buffer in accordance with the invention on the above story. In particular, a
first column 70 of the table contains the current word being read into the buffer, a second column
72 contains the determination of the type of word by the parser in accordance with the invention,
a third column 74 contains the contents of the buffer at the particular time, a fourth column 76
contains the word index (i.e., the phrases which are being extracted from the document) and a
fifth column 78 contains comments about the parsing process.

As shown in Figure 5A, the first word read into the buffer is a sequence of asterisks at the
beginning of the story which are classified by the parser as a break word (punctuation) and
deleted from the buffer. The next word is "Computer" which is entered into the buffer since it is
not a break word and the next word, which is "Select" is also entered into the buffer since it is
also not a break word. Thus, the buffer contains the phrase "Computer Select" as shown in a cell
80. The next word in the document is a comma which is classified as a break character by the
parser. Because the break character is punctuation (an explicit break), the words in the buffer are
saved in the database as shown in the Word Index column 76 and the buffer is flushed. Now,
new words are read into the buffer and parsed. The next word into the buffer is "October" which
is a hard stop word because it relates to a date and it is deleted. The next word received by the
buffer is "1995" which is a break character since it is a number and it is also deleted. The next
word received by the buffer is "COPYRIGHT" which is identified as a stop word because it is all
capital letters and it is deleted. The next word is "Newsbytes" which is not a break character and
is therefore stored into the buffer. The next word is "Inc." which is also stored in the buffer. The
next word is a period which is a break character so that the buffer contents "Newsbytes Inc." are
saved into a database as shown in the Word Index column, the break character is deleted and the
buffer is flushed.

The next two words received by the buffer, which are "1995" and a sequence of asterisks,
are both break words which are deleted. The next two words received by the buffer are
"Newsbytes" and "Newsbytes" which are both stored in the buffer. The next word received is
"August" which is a hard stop word so that the contents of the buffer and the hard stop word are
deleted. The next three words received by the buffer are all break characters (i.e., numbers or
punctuation) which are deleted. The next word is a word containing a number in a cell 82 which
is stored in the buffer, but then deleted when the next character is a break character because the
buffer only contains a single word. As can be seen in Figures 5B - 5L, the parsing process
continues for the entire document so that a list of key phrases, as shown in the Word Index
column 76, is extracted from the document and saved in a database.

In summary, phrases which characterize the document or piece of text may be rapidly
extracted from the document in accordance with the invention. The invention uses the break
characters in the document or the piece of text to separate the phrases from each other and to
extract the key phrases for a document. In the example above, the extracted phrases, such as
"Newsbytes Inc.", "snake-like robot", "NEC Corporation", "robotically controlled electronic
snake", "disaster relief work" and "world's first active universal joint" permit a person reviewing
only the key phrases to understand the context of the document without reviewing the entire
document. The parsing system in accordance with the invention performs the extraction of the
key phrases more rapidly than conventional parsing systems, which is important as the
total amount of textual data and documents available for parsing increases at an exponential rate
due, in part, to the explosion of the use of the Internet.

The parser in accordance with the invention may be used to parse documents in various 
different foreign languages with minor modifications to the rules database to reflect changes in 
the characters and changes in the syntax of the language. To better understand this, an example 
of a piece of Japanese text is described along with the resulting Japanese taxonomy. However, 
the invention may be used with a variety of different foreign languages with minor modifications 
to the rules database. 

Figure 6 is an example of a piece of Japanese text 90 while Figure 7 illustrates a list 92 of 
phrases 94 that have been extracted from the piece of Japanese text using the two-pass parsing 
method in accordance with the invention. 

While the foregoing has been with reference to a particular embodiment of the invention, 
it will be appreciated by those skilled in the art that changes in this embodiment may be made 
without departing from the principles and spirit of the invention, the scope of which is defined by 
the appended claims.

Claims:

1. A system for parsing a piece of foreign language text into one or more phrases
which characterize a foreign language document, the system comprising:
a buffer for reading one or more words from the piece of text into the buffer until a break
character is identified;
a parser for identifying a phrase contained in the buffer, the phrase being a sequence of
two or more words in between break characters;
the parser further comprising means for determining the type of break character that
follows the identified phrase and means for saving a key phrase from the buffer based on the
determined type of break character;
a database for storing the key foreign language phrases.

2. The system of Claim 1, wherein the buffer further comprises means for flushing
the buffer when the key phrase is stored in the database or the phrase in the buffer is deleted.

3. The system of Claim 1 further comprising a retriever for retrieving all occurrences
of the extracted phrases from the piece of text after the piece of text has been parsed.

4. A method for parsing a piece of text into one or more phrases which characterize
the document, the method comprising:
reading one or more words from the piece of text into a buffer until a break character is
identified;
identifying a phrase contained in the buffer, the phrase being a sequence of two or more
words in between break characters;
determining the type of break character that follows the identified phrase; and
saving a key phrase from the buffer into a database based on the determined type of break
character.

5. The method of Claim 4 further comprising flushing the buffer when the key
phrase is stored in the database or the phrase in the buffer is deleted.

6. The method of Claim 4 further comprising retrieving all occurrences of the
extracted phrases from the piece of text after the piece of text has been parsed.

7. A system for parsing a piece of text into one or more phrases which characterize a
document, the system comprising:
a buffer for reading one or more words from the piece of text into the buffer until a break
character is identified;
a parser for identifying a phrase contained in the buffer, the phrase being a sequence of
two or more words in between break characters;
the parser further comprising means for determining the type of break character that
follows the identified phrase and means for saving a key phrase from the buffer based on the
determined type of break character;
a database for storing the key foreign language phrases; and
a retriever for retrieving all occurrences of the extracted phrases from the piece of text
after the piece of text has been parsed.

8. The system of Claim 7, wherein the buffer further comprises means for flushing
the buffer when the key phrase is stored in the database or the phrase in the buffer is deleted.

9. A method for parsing a piece of text into one or more phrases which characterize
the document, the method comprising:
reading one or more words from the piece of text into a buffer until a break character is
identified;
identifying a phrase contained in the buffer, the phrase being a sequence of two or more
words in between break characters;
determining the type of break character that follows the identified phrase;
saving a key phrase from the buffer into a database based on the determined type of break
character; and
retrieving all occurrences of the extracted phrases from the piece of text after the piece of
text has been parsed.

10. The method of Claim 9 further comprising flushing the buffer when the key
phrase is stored in the database or the phrase in the buffer is deleted.

11. A system for parsing a piece of text into one or more phrases which characterize
the document, the system comprising:
a first pass comprising means for identifying a phrase contained in a buffer wherein the
phrase is a sequence of two or more words in between break characters, means for determining
the type of break character that follows the identified phrase and means for saving a key phrase
from the buffer based on the determined type of break character; and
a second pass comprising means for retrieving all occurrences of the extracted phrases
from the piece of text.

12. A method for parsing a piece of text into one or more phrases which characterize
the document, the method comprising:
performing a first pass through the piece of text, the first pass comprising identifying a
phrase contained in a buffer wherein the phrase is a sequence of two or more words in between
break characters, determining the type of break character that follows the identified phrase and
saving a key phrase from the buffer based on the determined type of break character; and
performing a second pass through the piece of text comprising retrieving all occurrences
of the extracted phrases from the piece of text.

ABSTRACT OF THE DISCLOSURE

A parsing system and method are provided in which the break characters in the document 
are used to rapidly parse the document and extract one or more key phrases from the document 
which characterize the document. The break characters in the document may include explicit 
break characters, such as punctuation, soft stop words and hard stop words. The determination of 
which phrases in the document are extracted depends upon the type of break character appearing 
after the phrase in the document. The parser may also be used to parse a foreign language 
document into one or more phrases. 
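
The three kinds of break character named above can be illustrated with a short classification sketch. The word lists and the names BreakType and classify are hypothetical; the abstract names the categories but not their members.

```python
import string
from enum import Enum


class BreakType(Enum):
    EXPLICIT = "explicit"    # punctuation
    SOFT_STOP = "soft stop"
    HARD_STOP = "hard stop"


# Hypothetical word lists, chosen only to make the example runnable.
SOFT_STOP_WORDS = {"and", "in", "of"}
HARD_STOP_WORDS = {"a", "an", "are", "is", "the"}


def classify(token):
    """Return the BreakType of a token, or None if it is an ordinary word."""
    if token and all(ch in string.punctuation for ch in token):
        return BreakType.EXPLICIT
    word = token.lower()
    if word in HARD_STOP_WORDS:
        return BreakType.HARD_STOP
    if word in SOFT_STOP_WORDS:
        return BreakType.SOFT_STOP
    return None
```

Because the word lists are plain data in this sketch, a parser built this way could plausibly handle a foreign language document by substituting stop-word lists for that language.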
NEC Develops "Snake-Like" Robot
Author 
Mark, Jeremy 
Full Text 

TOKYO, JAPAN, 1995 AUG 31 (NB) - NEC Corporation has developed a robotically controlled electronic snake
that offers far more movement than previously designed robots. The company says that the much greater freedom of
movement "makes it perfect" for everything from industrial to disaster relief work.

The secret of the new device lies in a revolutionary new type of universal joint. Previously joints in the body of robot
have been restricted to movement in just one plane, either left and right or up and down. But the Tokyo-based
company says it has succeeded in developing the world's first active universal joint.

Controlled by two motors, the joint allows full freedom of movement in all planes at each of the six joints along the 
robot's body, allowing the unit to crawl into places previously inaccessible. 

At the heart of the robot is a computer processing unit that receives signals from the operator's handset and controls
movement. The controller can instruct the computer to control all the joints in harmony or specify individual control of
each joint if necessary.

A video camera at the head of the robot sends signals back to the operator who can use them to steer the unit and also 
to examine places inaccessible to humans. 

The entire device is 1.4 meters long and measures 42 millimeters in diameter. It weighs 4.6 kilograms.
The as-yet unnamed device is not yet commercially available, NEC's Mark Pearce told Newsbytes. "It will be a couple
of years before everything is sorted out and it's ready to be sold. We have to increase the speed amongst other things,"
he said.

NEC says typical applications for such a robot could be investigation of complex pipework or as an aid to search teams
in disaster-hit areas where the device could crawl through the rubble of collapsed buildings.

(Martyn Williams/19950830/Press contact Mark Pearce, NEC Corporation, tel +81-3-3798-6511,
fax +81-3-3457-7249, Internet e-mail rriaku_10-22150^addm.necxo.jp/NEC950831/PHOTO)
DECLARATION AND POWER OF ATTORNEY



DECLARATION: 



As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am the original, first and sole inventor (if only one name is listed below) or an
original, first and joint inventor (if plural names are listed below) of the subject matter which is 
claimed and for which a patent is sought on the invention entitled: 

SYSTEM AND METHOD FOR PARSING A DOCUMENT 

the specification of which (check only one item below): 

[X] is attached hereto.

[ ] was filed as United States Application Serial No. ______ on ______
and was amended on ______ (if applicable).

[ ] was filed as PCT international application Number ______ on ______
and was amended under PCT Article 19 on ______ (if applicable).

I hereby state that I have reviewed and understand the contents of the above-identified 
specification, including the claims, as amended by any amendment referred to above. 

I acknowledge the duty to disclose information which is material to the examination of 
this application in accordance with Title 37, Code of Federal Regulations, §1.56(a).

I hereby claim foreign priority benefits under Title 35, United States Code, §119 of any
foreign application(s) for patent or inventor's certificate or of any PCT international
application(s) designating at least one country other than the United States of America listed
below and have also identified below any foreign application(s) for patent or inventor's
certificate or any PCT international application(s) designating at least one country other than the
United States of America filed by me on the same subject matter having a filing date before that
of the application(s) on which priority is claimed:



PRIOR FOREIGN/PCT APPLICATION(S) AND ANY PRIORITY CLAIMS UNDER 35 U.S.C. 119:

Country (if PCT, indicate "PCT")    Application Number    Date Filed    Priority Claimed (Yes/No)



















I hereby claim the benefit under Title 35, United States Code, §120 of any United States 
application(s) or PCT international application(s) designating the United States of America that 
is/are listed below and, insofar as the subject matter of each of the claims of this application is 
not disclosed in that/those prior application(s) in the manner provided by the first paragraph of 
Title 35, United States Code, §112, I acknowledge the duty to disclose material information as
defined in Title 37, Code of Federal Regulations, § 1.56(a) which occurred between the filing 
date of the prior application(s) and the national or PCT international filing date of this 
application: 



PRIOR U.S. APPLICATIONS OR PCT INTERNATIONAL APPLICATIONS DESIGNATING THE U.S. 
FOR BENEFIT UNDER 35 U.S.C. 120: 



U.S. APPLICATIONS

U.S. APPLICATION NUMBER    U.S. FILING DATE    STATUS (check one: PATENTED / PENDING / ABANDONED)

09/288,994                 April 9, 1999       X



PCT APPLICATIONS DESIGNATING THE U.S.

PCT APPLICATION NO.    PCT FILING DATE    U.S. SERIAL NUMBERS ASSIGNED (if any)



POWER OF ATTORNEY: 



As a named inventor, I hereby appoint the following attorney(s) and/or agent(s) with full power 
of substitution to act exclusively to prosecute this application and transact all business in the 
Patent and Trademark Office connected therewith: 

Barry N. Young (Reg. No. 27,744); Timothy W. Lohse (Reg. No. 35,255); Stephen E. Reiter (Reg. No. 31,192);
Steven R. Sprinkle (Reg. No. 40,825); William N. Hulsey III (Reg. No. 33,402); Terrance A. Meador (Reg. No.
30,298); Ramsey R. Stewart (Reg. No. 38,322); June M. Learn (Reg. No. 31,238); John Oskorep (Reg. No.
41,234); Timothy N. Ellis (Reg. No. 41,734); William G. Goldman (Reg. No. 42,590); Sheila Kirschenbaum (Reg.
No. 44,835); Travis L. Dodd (Reg. No. 42,491); Charles D. Gavrilovich, Jr. (Reg. No. 41,031); Gerald W.
Maliszewski (Reg. No. 38,054); Hayward A. Verdun (Reg. No. 43,223); Armando Pastrana, Jr. (Reg. No. 44,997);
Richard M. Goldman (Reg. No. 25,585)

All correspondence should be addressed to: 

Timothy W. Lohse 

GRAY CARY WARE & FREIDENRICH 
Patent Department - Hillview 
3340 Hillview Avenue 
Palo Alto, CA 94304 

All telephone calls should be directed to Timothy W. Lohse, telephone number (650) 
320-7400. 



Inventor's Full Name: Claude VOGEL

Inventor's Signature:

Date:

Residence (City, State and/or country): 94140 Alfortville, France

Citizenship: France

Post Office Address: 21, Rue Raymond Jaclard

